snakefmt comments parsing has unintended side-effects #85

dlaehnemann · 2020-11-18T18:07:06Z

I just tried to get snakefmt --check to work for snakemake-wrappers: snakemake/snakemake-wrappers#270

However, the wrappers sport a large variety of comments in the Snakefiles, with different arrangements that each make sense for different settings. snakefmt rearranges them all in a standardized way, but in the process messes up their arrangement, oftentimes severing the clear association between some rule keyword and the comment. Here's the commit with which I tried to revert and fix those wrong adjustments -- it provides a good sample of how people may want to use comments in Snakefiles:
snakemake/snakemake-wrappers@c82e430

The text was updated successfully, but these errors were encountered:

johanneskoester · 2020-11-18T19:14:52Z

Ok, I'll deactivate snakefmt on snakemake-wrappers again for now.

mbhall88 · 2020-11-19T01:03:02Z

Ok, so what you're saying is you would like

rule bedtools_sort_bed:
    input:
        in_file="a.bed", # an optional sort file can be set as genomefile by the variable genome or
         # as fasta index file by the variable faidx
        genome="dummy.genome",

to be left as is and not turned into

rule bedtools_sort_bed:
    input:
        in_file="a.bed",
        # an optional sort file can be set as genomefile by the variable genome or
        # as fasta index file by the variable faidx
        genome="dummy.genome",

This is also brings up something that has been low-key bugging me for a while. PEP8 specifies that inline comments should be separated from the statement by at least two spaces.

For example, both black and snakefmt correctly diff this following

-foo = "bar" # a comment
+foo = "bar"  # a comment

(i.e. the extra space between the statement and the comment). But snakefmt doesn't do the same inside a rule

# test.smk
rule foo:
    input:
        bar="file.txt", # comment
        # another comment

$ snakefmt -l 10 --diff test.smk
  rule foo:
      input:
          bar="file.txt", # comment
-         # another comment
+          # another comment
? +

This also makes me realise @dlaehnemann that I am having a little difficulty recreating your issue.

Could you please provide a concrete example we can use for testing purposes showing actual and expected results for a small snakefile?

johanneskoester · 2020-11-19T07:56:05Z

Here we go.

Correct:

rule bamtools_filter:
    input:
        "{sample}.bam",
    output:
        "filtered/{sample}.bam",
    params:
        # optional parameters
        tags = [ "NM:<4", "MQ:>=10" ],  # list of key:value pair strings
        min_size="-2000",
        max_size="2000",
        min_length="10",
        max_length="20",
        # to add more optional parameters (see bamtools filter --help):
        additional_params='-mapQuality ">=0" -isMapped "true"',
    log:
        "logs/bamtools/filtered/{sample}.log",
    wrapper:
        "master/bio/bamtools/filter"

snakefmt:

rule bamtools_filter:
    input:
        "{sample}.bam",
    output:
        "filtered/{sample}.bam",
    params:
        tags=["NM:<4", "MQ:>=10"], # optional parameters
         # list of key:value pair strings
        min_size="-2000",
        max_size="2000",
        min_length="10",
        max_length="20", # to add more optional parameters (see bamtools filter --help):
        additional_params='-mapQuality ">=0" -isMapped "true"',
    log:
        "logs/bamtools/filtered/{sample}.log",
    wrapper:
        "master/bio/bamtools/filter"

as can be seen, the single line comments that should refer to the lines below are moved to the lines above, which changes their semantics.

dlaehnemann · 2020-11-19T10:16:53Z

Yeah, sorry for the unspecific dump, yesterday. I had spent hours fixing up that pull request, so simply wanted to note the issue without spending more time on it. So I am also sorry for the brisk tone of that issue...

@johanneskoester provided the most important recurring issue with the comment formatting, thanks for this more constructive approach!
I'll go through the commit again, now, and provide a few more comment use cases which I think make sense and should not be changed.

dlaehnemann · 2020-11-19T10:21:09Z

For this one, I am actually not 100% sure if this is feasible with a linter, as it goes against the PEP you mentioned above. But I nevertheless want to document it here:

Wanted layout:

rule chm_eval_kit:                           # [hide]
    output:                                  # [hide]
        directory("resources/chm-eval-kit"), # [hide]
    params:                                  # [hide]
        tag="v0.5",                          # [hide]
        version="20180222",                  # [hide]
    log:                                     # [hide]
        "logs/chm-eval-kit.log",             # [hide]
    cache: True                              # [hide]
    wrapper:                                 # [hide]
        "master/bio/benchmark/chm-eval-kit"  # [hide]
                                             # [hide]

current snakefmt

rule chm_eval_kit: # [hide]
    output: # [hide]
        directory("resources/chm-eval-kit"), # [hide]
    params: # [hide]
        tag="v0.5", # [hide]
        version="20180222", # [hide]
    log: # [hide]
        "logs/chm-eval-kit.log", # [hide]
    cache: True # [hide]
    wrapper: # [hide]
        "master/bio/benchmark/chm-eval-kit" # [hide]
         # [hide]

dlaehnemann · 2020-11-19T10:28:55Z

Here is an example, where current snakefmt even breaks the rule syntax (by commenting out the thread: entry's value when pulling up the second row with a comment):

wanted layout

    threads:  # Samtools takes additional threads through its option -@
        8     # This value - 1 will be sent to -@

current snakefmt

    threads: # Samtools takes additional threads through its option -@ 8 # This value - 1 will be sent to -@

If reformatting this would be wanted, an alternative formatting could be:

    threads: 8 # Samtools takes additional threads through its option -@; This value - 1 will be sent to -@

dlaehnemann · 2020-11-19T10:32:46Z

Here, snakefmt thinks that the comments with a completely different indentation level do not belong to the indentation level they are at (and the resources spec that follows), but to the preceding inline comment.

wanted format

    params:
        extra="", # optional
        java_opts="", # optional
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024,

current snakefmt

    params:
        extra="", # optional
        java_opts="", # optional
         # optional specification of memory usage of the JVM that snakemake will respect with global
         # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
         # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
         # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024,

dlaehnemann · 2020-11-19T10:35:10Z

Similar to the previous example, but with commented out extra params, which is a good way to showcase possible further extra params in an example rule for a Snakemake wrapper. Here, snakefmt adds an extra whitespace that does not make sense for the use-case.

wanted format

    params:
        extra="", # optional
        java_opts="", # optional
        #spark_runner="",  # optional, local by default
        #spark_master="",  # optional
        #spark_extra="", # optional

current snakefmt

    params:
        extra="", # optional
        java_opts="", # optional
         #spark_runner="",  # optional, local by default
         #spark_master="",  # optional
         #spark_extra="", # optional

dlaehnemann · 2020-11-19T10:44:05Z

@mbhall88 I think this list now gives a representative example for all the comment use cases I had to fix manually after running snakefmt on the snakemake-wrappers to avoid errors and wrong associations. I hope these are usable as test cases in fine-tuning snakefmt, even though I reduced them to the rule parts that are necessary to showcase the respective issue. But the rest of the rule could always be something generic.

bricoletc · 2020-11-19T11:06:50Z

Thank you for the enormous detective work @dlaehnemann and for the example @johanneskoester . We can now work with these concrete examples. we had clearly not given comment treatment a hard enough look, let's work on that now!

dlaehnemann · 2020-11-19T11:12:40Z

Good to know, that these unexpected three hours yesterday were useful after all! :D
And I guess snakemake wrappers was simply a very large and diverse collection of all kinds of comment use cases out there...

And thanks for providing this, this will be incredibly useful for the snakemake wrappers and snakemake workflows projects, catching lots of unwanted stuff automatically in pull requests! So time invested here is definitely worth it in the long run!

mbhall88 · 2020-11-20T01:26:03Z

Thanks so much for this!! I'm amazed we've gotten this far without realising this.

I guess the best approach is to leave comments as they are - unless they have less than 2-space from inline statement, in which case they will just have spacing added.

dlaehnemann · 2020-11-20T08:50:41Z

Yeah, to start with, that is probably the easiest and cleanest solution.

Should we at some point have any strong opinions about how comments should be formatted, we should use the above test cases to check that it keeps (most of) them as intended. But any such solution will probably have to reason about comment indentations (as in the # [hide] comments example and about alignment of comments with other content (as in the Java memory management comment example).

mbhall88 · 2020-11-23T01:50:46Z

Should we at some point have any strong opinions about how comments should be formatted, we should use the above test cases to check that it keeps (most of) them as intended. But any such solution will probably have to reason about comment indentations (as in the # [hide] comments example and about alignment of comments with other content (as in the Java memory management comment example).

I think we should just try and stick to what PEP8/black specifies? Which would appease both of the examples you mention I think.

bricoletc · 2020-12-01T11:19:40Z

For this one, I am actually not 100% sure if this is feasible with a linter, as it goes against the PEP you mentioned above. But I nevertheless want to document it here:

Wanted layout:

rule chm_eval_kit:                           # [hide]
    output:                                  # [hide]
        directory("resources/chm-eval-kit"), # [hide]
    params:                                  # [hide]
        tag="v0.5",                          # [hide]
        version="20180222",                  # [hide]
    log:                                     # [hide]
        "logs/chm-eval-kit.log",             # [hide]
    cache: True                              # [hide]
    wrapper:                                 # [hide]
        "master/bio/benchmark/chm-eval-kit"  # [hide]
                                             # [hide]

current snakefmt

rule chm_eval_kit: # [hide]
    output: # [hide]
        directory("resources/chm-eval-kit"), # [hide]
    params: # [hide]
        tag="v0.5", # [hide]
        version="20180222", # [hide]
    log: # [hide]
        "logs/chm-eval-kit.log", # [hide]
    cache: True # [hide]
    wrapper: # [hide]
        "master/bio/benchmark/chm-eval-kit" # [hide]
         # [hide]

Coming back to this, I don't think it is trivial to leave the comment alignment.
The problem is that we rely on python's tokenize library to process the text, and the tokenizer detects comments but ignores the preceding whitespace.

In fact for this situation black reformat like snakefmt does:

class p:          # [hide]
    elem1 = 2     # [hide]
    elem2 = 222   # [hide]

becomes

class p:  # [hide]
    elem1 = 2  # [hide]
    elem2 = 222  # [hide]

I would argue we should probably stick with that

dlaehnemann · 2020-12-01T12:10:25Z

Jup, that's what I figured and why I added the caveat when posting this. It's a shame, because the alignment does make it a lot more readable, but for the formatter to see this and then keep it that way does sound to complicated to implement quickly. So I agree that this is not worth investing lots of energy.

Maybe one workaround for such examples would be to include black's comment-based tag that turns off formatting for a certain region (# fmt off and # fmt on):
https://github.com/psf/black#the-black-code-style

We will just have to remember to include this in the respective snakemake-wrappers before applying snakefmt again. Let me know once you're done with updating the comments handling, and I'd be glad to work on a new snakemake-wrappers PR to implement snakefmt over there, and will try to check up on all my lessons learned from last time around... :D

bricoletc · 2020-12-01T12:19:08Z

Adding #fmt off/on flags is an interesting idea! It would need to be a new snakefmt feature as we have our own parser to deal with snakemake syntax. Would you like to raise that as an issue @dlaehnemann ? As it is your idea; else I can.

I'm working on all the other failing test cases you have introduced, will let you know when those are done :) its so helpful to have you testing this out on all these real-world files.

dlaehnemann · 2020-12-01T12:54:49Z

Adding #fmt off/on flags is an interesting idea! It would need to be a new snakefmt feature as we have our own parser to deal with snakemake syntax. Would you like to raise that as an issue @dlaehnemann ? As it is your idea; else I can.

Done, see issue #86 .

I'm working on all the other failing test cases you have introduced, will let you know when those are done :) its so helpful to have you testing this out on all these real-world files.

Sounds like a fun game of whack-a-mole. Enjoy, I guess?

* Add PEP8 2-space inline comment spacing, with unit test * Remove leading space in multiple comments in parameters (snakemake#85) * Add 4 failing test cases taken from snakemake#85

bricoletc · 2020-12-03T09:03:25Z

Here is an example, where current snakefmt even breaks the rule syntax (by commenting out the thread: entry's value when pulling up the second row with a comment):

wanted layout
    threads:  # Samtools takes additional threads through its option -@
        8     # This value - 1 will be sent to -@
current snakefmt
    threads: # Samtools takes additional threads through its option -@ 8 # This value - 1 will be sent to -@
If reformatting this would be wanted, an alternative formatting could be:
    threads: 8 # Samtools takes additional threads through its option -@; This value - 1 will be sent to -@

I'd like to propose an alternative solution:

Given:

rule all:
    # Comm0
    # Comm1
    threads: # Comm2
        8  # Comm3

include: # Comm1
    t

Produce:

rule all:
    # Comm0
    # Comm1
    # Comm2
    threads: 8  # Comm3


# Comm1
include: t

Given the inline comment documents the keyword (eg here # Comm2 documents threads) I think it's reasonable to put it above for keywords that get formatted inline, like include and threads. One caveat is # Comm0 and # Comm1 here could be documenting rule all not threads, but then the user can merge # Comm2 and #Comm3 themselves; I'd rather not force comment merging.

Would you agree @dlaehnemann ?

dlaehnemann · 2020-12-03T09:28:51Z

I definitely agree, that comment merging is difficult and thus I expect it to be error-prone. But moving each comment up one line could be similarly problematic. As you are pointing out, Comm0 and Comm1 could be referring to the rule, and then Comm2 would be erroneously aligned with them. Similarly, Comm2 and Comm3 could be intended as a block comment with continuous text and breaking their alignment could lead to confusion. So with the given variety of commenting styles out there, reasoning about these in a formatter and trying to systematically change them seems difficult.

An alternative could be to clearly define an intended Snakemake commenting style and document this everywhere (docs of Snakemake, snakemake-wrappers, snakemake-workflows, etc.). We would then have to go through existing projects once, to implement this style there. But even then, I think having the formatter change comment layout (beyond what black would do) is probably going to be problematic.

bricoletc · 2020-12-03T11:19:16Z

Similarly, Comm2 and Comm3 could be intended as a block comment with continuous text and breaking their alignment could lead to confusion.

Could you expand on what you mean by this @dlaehnemann ?
And I think I see your points, but for this specific problem what do you reckon we should do, merge the comments?

Perhaps @johanneskoester has a guiding opinion?

bricoletc · 2020-12-03T11:27:08Z

Also I'm not sure there exists a valid Python statement where you have

statement : #comment 
    statement #comment

(Possibly because Python statements like class or if, that are followed by :, expect a newline before next statement)
So not sure what black would do anyway...seems like this could be uncharted territory.

dlaehnemann · 2020-12-03T11:45:15Z

Yeah, that was not really clear. Here's an example:

    threads:  # this is a very, very, very long and
        8  # useless block comment about threads

I also came up with much more extreme case, but I think we won't be able to define a universally useful behaviour for all the comment types out there.

But I double-checked on the general python PEP 8 on comments, and maybe we could simply follow the very basic guidance there:
https://www.python.org/dev/peps/pep-0008/#comments

This would mean that:

block comments are always assumed to precede whatever they are describing
inline comments (i.e. a # comment after some code in the same line) are always assumed to describe that line and should have at least two spaces after the code (this does imply, that we could leave comments with more indentation wherever they are...)

And then the formatter could either emit warnings, if it finds funny stuff that doesn't seem to conform to this, or changes this according to the above assumptions. So e.g.:

1    threads:
2        8
3        # some weirdly indented comment, could be about the 8 above or the shell directive below
4    shell:
5        "do_something --arg1 test"

This would throw Warning: Block comment in line 3 not properly aligned with following 'shell' keyword, which it is expected to describe. For python comment formatting, see: https://www.python.org/dev/peps/pep-0008/#comments. Or, in conforming to the PEP 8, the formatter would adjust the indentation to that of shell:.

bricoletc · 2020-12-03T12:27:07Z

I see what you mean. For the example on the threads block comment, I think we have to assume that what comes after the 8 is the inline comment associated with threads, and place the other comment above threads. If there are yet more comments above threads, they are assumed to be also block comments describing threads. I'd happily issue a warning that such a reorganisation is happening.

For your second example I agree issuing a warning would also make sense- i wouldn't be keen on changing the comment indent though. given how the parser is set up its a lot of work to output all the information (eg it is expected to describe shell), I could at least add a warning referring to PEP8!

dlaehnemann · 2020-12-03T14:49:34Z

So for a keyword like threads: (with the colon) followed directly by a comment, you would assume that this belongs above the keyword? I guess that should usually make sense, but it does mean changing what the user specified as an inline comment to a block comment. So yes, maybe issue a warning whenever that happens.

My second example, with the weird indentation, is actually a safe bet to adjust the indentation, based on PEP 8. Maybe also issue a warning that this was done. But no need to include too much info beyond the comment itself. If a line number can be issued, that's helpful, but otherwise the text of the comment itself should do -- this at least enables a quick search by the user.

One change I would actually suggest, is to only adjust the spacing of inline comments, if they are below the two whitespaces distance from the code. This way, aligned inline comments across multiple lines would stay untouched and the formatting would nevertheless conform to PEP 8. Also, this would remove the need of introducing the on/off tags suggested in issue #86 .

bricoletc · 2020-12-03T17:57:23Z

So for a keyword like threads: (with the colon) followed directly by a comment, you would assume that this belongs above the keyword?

I would only place it above the keyword if it occurs before the value: so

threads: 8  # comment

The inline comment stays, but

threads: #comment 
    8

The comment goes above, with a warning

My second example, with the weird indentation, is actually a safe bet to adjust the indentation, based on PEP 8.

I would prefer issuing a warning only; PEP says Block comments generally apply to some (or all) code that follows them, ; generally applies seems like a soft requirement.
In addition, black does this:

Before

class p:
    # Comment
    elem1 = 2
    # Woopsy
# Woopsy2
class a:
    pass

class p:
    # Comment
    elem1 = 2
    # Woopsy


# Woopsy2
class a:
    pass

One change I would actually suggest, is to only adjust the spacing of inline comments,

Again I would prefer to stick to what black does there. Do you think that people often line up comments like the [hide] example you spotted (what does that hide mean actually? :p)

Sorry for being pedantic. Also at some level this comes down to opinions..

bricoletc · 2020-12-04T18:54:05Z

Branch dev now has new functionality which I believe addresses all the breaking examples provided in this issue.

@dlaehnemann would you like to run tip of dev on the snakemake-wrappers files and check if things look good now? If that's too traumatic given the issues that came up last time, I can give it a go, though you might spot more clearly what's improved since last time. Amyway, I think progress has definitely been made!

dlaehnemann · 2020-12-04T23:12:58Z

Will do, I'll try it out ASAP. And simply report back here, if anything fishy still turns up, instead of fixing for hours... ;)

* Add PEP8 2-space inline comment spacing, with unit test * Remove leading space in multiple comments in parameters (snakemake#85) * Add 4 failing test cases taken from snakemake#85

bricoletc · 2020-12-15T12:02:23Z

Hey @dlaehnemann any news on this? I've run it through a few of the wrappers and was broadly satisfied. I could push out a new release and rework any incoming issues as they come later on?

dlaehnemann · 2020-12-15T12:06:47Z

Sorry @bricoletc , this got lost on the todo list. But I should find time at the end of the week.

If you're generally satisfied with the results on some wrappers, a release would be nice -- this way we could switch snakefmt on for the whole snakemake wrappers repo immediately, if changing the formatting works as we want it. Otherwise, we'll just have to iterate to another release. So a release would be welcome!

bricoletc · 2020-12-15T13:52:24Z

No problem! OK, release 0.2.5 is made 🚀

mbhall88 added the bug Something isn't working label Nov 19, 2020

dlaehnemann changed the title ~~snakefmt rearranges comments, often messing them up~~ snakefmt comments parsing has unintended side-effects Nov 19, 2020

dlaehnemann mentioned this issue Nov 30, 2020

fix snakefmt formatting after it was introduced in linting snakemake/snakemake-wrappers#270

Closed

dlaehnemann mentioned this issue Dec 1, 2020

introduce # fmt: off and # fmt: on tags to skip formatting within files #86

Open

bricoletc self-assigned this Dec 4, 2020

bricoletc mentioned this issue Dec 4, 2020

Better comment formatting #87

Merged

bricoletc closed this as completed Jul 21, 2021

snakefmt comments parsing has unintended side-effects #85

snakefmt comments parsing has unintended side-effects #85

Comments

dlaehnemann commented Nov 18, 2020

johanneskoester commented Nov 18, 2020

mbhall88 commented Nov 19, 2020

johanneskoester commented Nov 19, 2020

Correct:

snakefmt:

dlaehnemann commented Nov 19, 2020

dlaehnemann commented Nov 19, 2020

Wanted layout:

current snakefmt

dlaehnemann commented Nov 19, 2020

wanted layout

current snakefmt

dlaehnemann commented Nov 19, 2020

wanted format

current snakefmt

dlaehnemann commented Nov 19, 2020

wanted format

current snakefmt

dlaehnemann commented Nov 19, 2020

bricoletc commented Nov 19, 2020

dlaehnemann commented Nov 19, 2020

mbhall88 commented Nov 20, 2020

dlaehnemann commented Nov 20, 2020

mbhall88 commented Nov 23, 2020

bricoletc commented Dec 1, 2020

Wanted layout:

current snakefmt

dlaehnemann commented Dec 1, 2020

bricoletc commented Dec 1, 2020

dlaehnemann commented Dec 1, 2020 • edited Loading

bricoletc commented Dec 3, 2020

wanted layout

current snakefmt

dlaehnemann commented Dec 3, 2020

bricoletc commented Dec 3, 2020 • edited Loading

bricoletc commented Dec 3, 2020

dlaehnemann commented Dec 3, 2020

bricoletc commented Dec 3, 2020

dlaehnemann commented Dec 3, 2020

bricoletc commented Dec 3, 2020 • edited Loading

bricoletc commented Dec 4, 2020

dlaehnemann commented Dec 4, 2020

bricoletc commented Dec 15, 2020

dlaehnemann commented Dec 15, 2020

bricoletc commented Dec 15, 2020 • edited Loading

dlaehnemann commented Dec 1, 2020 •

edited

Loading

bricoletc commented Dec 3, 2020 •

edited

Loading

bricoletc commented Dec 3, 2020 •

edited

Loading

bricoletc commented Dec 15, 2020 •

edited

Loading