Simplify pipeline parameters #202

Open
WackerO opened this issue Nov 22, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@WackerO
Collaborator

WackerO commented Nov 22, 2023

Description of feature

From some of the recent issues (and also from my own experience) I think the multitude of semi-optional parameters in the pipeline is a problem.

Something like exploratory_assay_names is not marked as a required param in the nextflow schema, but effectively it is, as if the param is not correctly set, the pipeline will fail. Other params, like those for a GTF file (e.g. features_id_col), are expected even when the user does not input a GTF file.
The param differential_file_suffix should either be removed completely (as it depends on the input type of the dataset) or be made optional; the suffix should be determined automatically by the pipeline.

I'll see which parts I can simplify. I think this would make the pipeline easier to run, especially for users who don't know much about coding. If anyone has additional ideas, please post them here!
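
To illustrate the idea of determining the suffix automatically, here is a minimal Python sketch (not pipeline code). The input type names, the suffix values, and the helper `resolve_suffix` are all hypothetical placeholders; only the idea of "use the user's value if given, otherwise derive it from the input type" comes from the points above.

```python
# Hypothetical mapping from input type to results-file suffix.
# Both the type names and the suffixes are illustrative placeholders,
# not values taken from the pipeline.
DEFAULT_SUFFIXES = {
    "counts": ".counts.results.tsv",
    "array": ".array.results.tsv",
}


def resolve_suffix(input_type, user_suffix=None):
    """Return the user's suffix if one was given, else the default for the input type."""
    if user_suffix:
        # An explicit value always wins, so advanced users can still override it.
        return user_suffix
    try:
        return DEFAULT_SUFFIXES[input_type]
    except KeyError:
        raise ValueError(f"No default suffix known for input type {input_type!r}")
```

With this shape the suffix stays a parameter, but most users would never need to set it.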

@WackerO WackerO added the enhancement New feature or request label Nov 22, 2023
@WackerO WackerO self-assigned this Nov 22, 2023
@olgabot

olgabot commented Nov 27, 2023

Hi @WackerO -- could this issue be caused by exploratory_assay_names? #196

I don't see --exploratory_assay_names listed here in the documentation:

image

But I do see it in the main branch:

"exploratory_assay_names": {

@pinin4fjords
Member

pinin4fjords commented Dec 14, 2023

> Something like exploratory_assay_names is not marked as a required param in the nextflow schema, but effectively it is, as if the param is not correctly set, the pipeline will fail.

That's just an oversight in the schema definition; please feel free to fix it. But we should probably put this option and the suffix one (and similar) in a new 'Advanced options' section. This IS a parameter, even if users don't change it, and it's nice for advanced users to be able to tweak the suffixes if they want.

You'll notice that exploratory_assay_names is a hidden parameter to discourage users from changing it (which is why you can't see it in the UI @olgabot - you shouldn't need to interact with it). We could do the same with others.

The alternative is hard-coding all these things into the workflow code, which I don't think would be an improvement.
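
As a rough illustration of how "required" and "hidden" are independent properties of a parameter, here is a small Python sketch. It assumes a JSON-Schema-like structure with a `required` list and a per-property `hidden` flag; the dict layout is simplified and the helper names `missing_required` and `visible_params` are hypothetical, not part of any real tooling.

```python
# Simplified stand-in for a JSON-Schema-style parameter definition.
# The layout is an assumption made for illustration only.
SCHEMA = {
    "required": ["exploratory_assay_names"],
    "properties": {
        "exploratory_assay_names": {"type": "string", "hidden": True},
        "differential_file_suffix": {"type": "string", "hidden": True},
    },
}


def missing_required(schema, params):
    """List required parameter names that are absent or empty in params."""
    return [
        name
        for name in schema.get("required", [])
        if params.get(name) in (None, "")
    ]


def visible_params(schema):
    """List parameter names that are not flagged as hidden."""
    return [
        name
        for name, spec in schema["properties"].items()
        if not spec.get("hidden", False)
    ]
```

In this sketch a parameter can be required for validation while still being hidden from the default listing, which is the combination discussed above.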

> Other params, like those for a GTF file (e.g. features_id_col), are expected even when the user does not input a GTF file.

That's because it's required in the absence of a GTF file - it's required everywhere we need to cross reference matrix rows with feature annotation (which is lots of places). If there is a context in which the pipeline can work without this parameter, feel free to highlight it, and we can make the parameter optional (and add checks everywhere it's needed but not supplied).

@WackerO
Collaborator Author

WackerO commented Jan 10, 2024

I'm so sorry, for some reason I was not notified of the activity in this issue and only now saw your responses!

> Hi @WackerO -- could this issue be caused by exploratory_assay_names? #196

Hmm, I'm not entirely sure, but I don't think that the issue is related to the names...

> You'll notice that exploratory_assay_names is a hidden parameter to discourage users from changing it (which is why you can't see it in the UI @olgabot - you shouldn't need to interact with it). We could do the same with others.

Aah, fair enough!

> That's because it's required in the absence of a GTF file - it's required everywhere we need to cross reference matrix rows with feature annotation (which is lots of places). If there is a context in which the pipeline can work without this parameter, feel free to highlight it, and we can make the parameter optional (and add checks everywhere it's needed but not supplied).

You are indeed right! I think when I wrote that point, I was looking at the nextflow.schema description of the param, which is "Feature ID attribute in the GTF file (e.g. the gene_id field)". I think this should be explained in a bit more detail; I'll think of some text and you can tell me what you think of it.

I'll start working on a PR to at least make the name_col params optional (so that by default, the respective id_col will be used instead of some hard-coded value like gene_name) and we can see what else can be simplified in that PR :)
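
The fallback I have in mind could be sketched like this in Python (just to show the logic, not actual pipeline code). `features_id_col` is the param mentioned above; the name `features_name_col`, the `prefix` argument, and the helper `resolve_name_col` are assumptions for illustration.

```python
def resolve_name_col(params, prefix="features"):
    """Return the name column for a prefix, falling back to its id column.

    Assumes params is a dict holding '<prefix>_id_col' and, optionally,
    '<prefix>_name_col' (a hypothetical naming pattern).
    """
    id_col = params[f"{prefix}_id_col"]
    # Use the name column only if it was explicitly set to a non-empty value.
    return params.get(f"{prefix}_name_col") or id_col
```

So if the user sets only the id column, that same column is used for names, and nothing like gene_name needs to be hard-coded as a default.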

Development

No branches or pull requests

3 participants