Skip to content
This repository has been archived by the owner on Aug 2, 2018. It is now read-only.

Naming conventions for NXF scripts #9

Closed
apeltzer opened this issue Feb 23, 2018 · 16 comments
Closed

Naming conventions for NXF scripts #9

apeltzer opened this issue Feb 23, 2018 · 16 comments

Comments

@apeltzer
Copy link
Member

After the discussion in the NGI-RNASeq Gitter (credits to @rfenouil ), we might want to set some standards for naming e.g. channels/processes/variables. I opened this ticket to keep track of the ideas and we could maybe integrate these soon to make them mandatory once we agreed on defaults:

Some ideas:

  • channels start with a ch_ prefix (to hinder confusing them with variables)

Other suggestions/ideas for variable / process / ... naming conventions would be great too!

@maxulysse
Copy link
Member

maxulysse commented Feb 23, 2018

I'm just not comfortable with the underscore, as it seems not used much in groovy.
I do prefer camelCase or PascalCase over snake_case.
I guess it depends from the developer, and looking at some pipelines, there is some inconsistency in casing scheme.
And as I even introduce a snake_case myself in Sarek, I'm not very consistent myself...

I think the most important point for the channels is to refer to the process it will be inputed into.
That will help prevent collisions.

But yeah, something to differentiate plain variables and channels should be a good start.

For the process, I think it's expected that they begin with an uppercase, but apart from that, I think the name should be as descriptive as possible, I usually go with RunTool if it's for a single tool, or DoAction if it's more than one tool involved, or if the tool is used several times for different options.
But I still have some inconstistency in Sarek, as something we just have Tool as the process name.

@ewels
Copy link
Member

ewels commented Feb 25, 2018

I'm just not comfortable with the underscore, as it seems not used much in groovy.

I don't really care about this too much 😛 Here especially - the ch_ isn't part of the name, it's purely a prefix to denote channel, so I think it's nice to physically remove it with a separator to make reading easier. Happy to say use PascalCase for process names though 👍

the most important point for the channels is to refer to the process it will be inputed into

Agreed, that was a key point that's missing above. So ch_ prefix, fileDescripter middle then perhaps _forProcessName suffix? Danger here is that a list of channels could become very long I guess.

@apeltzer
Copy link
Member Author

I have to say that I personally enjoy the snake_case formatting more, as it separates the "type" in this case more clearly from the fileDescriptor and _forProcessName suffix.

I also don't know whether we want to make things like this mandatory or just suggest to people to follow these rules as a suggestion. My feeling is to make it mandatory since that would improve overall interoperability, but that might be too harsh in the short/mid-term.

@Hammarn
Copy link

Hammarn commented Feb 27, 2018

I really like the ide about being specific about the channel names etc. I.e. anding the ch_ prefix. To keep that consistent it's probably better to enforce snake_case. Even though I don't have a strong preference for either scheme.

@rfenouil
Copy link

rfenouil commented Feb 27, 2018

If that can be useful, please find attached the modified version of your RNA-seq pipeline which includes renaming of channels. It's mostly a stripped-down version so nothing crazy on the code, but that may illustrate how channels renaming improves readability.
It's just a simple renaming scheme, then it will be up to you to decide what format is better.

PS: "help message not up to date in this version"

main.zip

@apeltzer
Copy link
Member Author

I really like it that way - its easy to see, you know exactly what a channel "contains" and what it will be used for.

@sven1103
Copy link
Member

But then we should also be consistent with the process names as well, right?

For example:

process analysis_diffExpression:

would result in channel names like:

ch_myAwesomeFiles_for...

well, don't even know how to do it properly here.

Maybe: ch_fileDescriptor_[for/from]processName
-> ch_myAwesomeFiles_forAnalysisDiffExpression

Could be easily checked via a regex in the lint tests.

@sven1103
Copy link
Member

sven1103 commented Feb 27, 2018

we could stick to process names like: process MagicMatterAnalysis:

@andreas-wilm
Copy link

Here my two cents:

  • I think casing rules shouldn't be enforced, but we could make a recommendation (if only we could agree on one)
  • I like ch_ as prefix for channels. As Phil points out, the underscore just serves as a separator, i.e. the prefix is not part of the name
  • I do not like the idea that channels refer to the process they will be inputed to. This sounds redundant (you can just search for the channel name, just like you would need to search for the process name). Also any change in process name then requires multiple changes

Andreas

@sven1103
Copy link
Member

sven1103 commented Feb 28, 2018

I think casing rules shouldn't be enforced, but we could make a recommendation (if only we could agree on one)

I am with you here, you can however check it and print it as warning. Like PEP guidelines i.e., if, and only if, we can agree to a standard :)

I do not like the idea that channels refer to the process they will be inputed to. This sounds redundant (you can just search for the channel name, just like you would need to search for the process name). Also any change in process name then requires multiple changes

Maybe this issue points out, that we all have different taste and work practices (regarding the ideas here) and there is no strong "right" or "wrong". Maybe we can just agree to add the 'ch_' prefix for now and continue with CI and wf tests, which imho is more important than naming conventions ;)

@timdiels
Copy link

+1 for a recommendation that isn't enforced, or at least explain what not to do. Sometimes there is a more intuitive name that is unlikely to name clash yet does not include the process' name. My main reason for including the process name in the channel name is to avoid name clashes. But I also divide my main.nf into sections (orthofinder, blast2go, ...), prefixing related channels with an abbreviation of the section name instead of the process name, again to avoid name clashes.

@ewels
Copy link
Member

ewels commented Mar 21, 2018

@timdiels - surely this doesn't need to be mutually exclusive though? You can still do all of those things with an additional ch_ prefix? I think the process name thing was a recommendation only and wouldn't be checked.

@timdiels
Copy link

@ewels - I misunderstood, I thought the idea was to enforce something like ch_{process}_{channel} rather than ch_{anything}. I haven't used many variables which do not contain channels, so I can't really chime in on whether or not to enforce a ch_ prefix but I'm not against it.

@ewels
Copy link
Member

ewels commented Mar 21, 2018

No, on re-reading it's the discussion of two points I guess 👍 So yes, basically I agree with you :)

@baffelli
Copy link

baffelli commented Apr 4, 2018

I'm very late to the discussion, but I somehow developed my own style (I was unaware of this discussion) which turns out to be very similar to @sven1103 proposal above. I tend to name processes as mySpecialProcess and channels using mySpecialDataForMySpecialProcess.
I'm not too keen in appending ch to channel variables since the For should be sufficent to identify them. Likewise, I don't like mixing uderscores and camelCase too much, so I'de be against enforcing the use of underscores as separators.

@ewels
Copy link
Member

ewels commented Aug 2, 2018

This issue was moved to nf-core/tools#80

@ewels ewels closed this as completed Aug 2, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants