Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Allow paramspace to separate filename params with custom separator #1299

Merged
merged 5 commits into from Apr 19, 2022

Conversation

kpj
Copy link
Contributor

@kpj kpj commented Dec 11, 2021

Description

This PR allows using Paramspace(df_params, filename_params="*", filename_sep="::") to generate paths of the form column1~{value1}::column2~{value2}.
This is useful when parameter names contain _ and we want to retrieve their values from path names in some rule (is there a better way of doing this?).

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

@kpj kpj requested a review from johanneskoester as a code owner Dec 11, 2021
@kpj kpj changed the title Allow paramspace to separate filename params with custom separator feat: Allow paramspace to separate filename params with custom separator Dec 11, 2021
@sonarcloud
Copy link

sonarcloud bot commented Dec 11, 2021

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@psteinb
Copy link
Contributor

psteinb commented Jan 11, 2022

This is is a wonderful PR. I was about to sit down and code it up myself as I need that facility to interface snakemake with hydra. Can't wait to see it land, @johanneskoester.

Thanks so much @kpj! 🙏

@kpj
Copy link
Contributor Author

kpj commented Mar 11, 2022

@johanneskoester Any thoughts on this? :-)

@johanneskoester
Copy link
Contributor

johanneskoester commented Apr 5, 2022

This is useful when parameter names contain _ and we want to retrieve their values from path names in some rule (is there a better way of doing this?).

I would just use the .instance attribute of the paramspace within the rule params for this, or am I missing something here?

Nevertheless, it does not hurt make this configurable.

@johanneskoester johanneskoester self-assigned this Apr 5, 2022
@kpj
Copy link
Contributor Author

kpj commented Apr 5, 2022

I would just use the .instance attribute of the paramspace within the rule params for this, or am I missing something here?

This is also what I do when a rule deals with a single parameter instance, but sometimes (e.g., when aggregating all individual results with expand("results/{params}/foo.csv", params=paramspace.instance_patterns) in a single rule), I want to access parameter values for each file as I process it.
Currently I am doing something really awkward like this (in R):

snakemake@input$fname_list %>%
  map_dfr(function(path) {
    path_parts <- gtools::split_path(path, depth_first = FALSE)
    param_str <- path_parts[[length(path_parts) - 1]]
    tmp <- list()
    for (param_pair in strsplit(param_str, "_")[[1]]) {
      parts <- strsplit(param_pair, "~")[[1]]
      tmp[parts[[1]]] <- parts[[2]]
    }
    read_csv(path) %>%
      mutate(!!!tmp)
  })

Nevertheless, it does not hurt make this configurable.

Agreed :-) My previous example is maybe somewhat obscure, but the filename_sep feature should be useful in general (see, e.g., #1299 (comment)).

@johanneskoester
Copy link
Contributor

johanneskoester commented Apr 5, 2022

Why not something like this?

rule aggregate:
    input:
        expand("results/{params}/foo.csv", params=paramspace.instance_patterns)
    params:
        instances=list(paramspace.dataframe.iterrows())
    ...

I would just use the .instance attribute of the paramspace within the rule params for this, or am I missing something here?

This is also what I do when a rule deals with a single parameter instance, but sometimes (e.g., when aggregating all individual results with expand("results/{params}/foo.csv", params=paramspace.instance_patterns) in a single rule), I want to access parameter values for each file as I process it. Currently I am doing something really awkward like this (in R):

snakemake@input$fname_list %>%
  map_dfr(function(path) {
    path_parts <- gtools::split_path(path, depth_first = FALSE)
    param_str <- path_parts[[length(path_parts) - 1]]
    tmp <- list()
    for (param_pair in strsplit(param_str, "_")[[1]]) {
      parts <- strsplit(param_pair, "~")[[1]]
      tmp[parts[[1]]] <- parts[[2]]
    }
    read_csv(path) %>%
      mutate(!!!tmp)
  })

Nevertheless, it does not hurt make this configurable.

Agreed :-) My previous example is maybe somewhat obscure, but the filename_sep feature should be useful in general (see, e.g., #1299 (comment)).

@kpj
Copy link
Contributor Author

kpj commented Apr 6, 2022

Why not something like this?

rule aggregate:
    input:
        expand("results/{params}/foo.csv", params=paramspace.instance_patterns)
    params:
        instances=list(paramspace.dataframe.iterrows())
    ...

That is a great point! I think I never considered this possibility because I did not want to rely on the seemingly implicit assumption that paramspace.instance_patterns and paramspace.dataframe.iterrows() have the same ordering.
I also sometimes have even weirder constructions which would require slightly more complicated constructions:

fname_list=expand(
    f"results/{paramspace.wildcard_pattern}/path/{{replicate}}/{{method}}/enrichment_result.csv",
    replicate=range(config["replicate_count"]),
    method=method_list,
    allow_missing=True,
  ),

But I could probably do something like this to solve the issue:

params:
  instances=[params for params in paramspace.dataframe.iterrows() for _ in range(config["replicate_count"]) for _ in method_list]

Now that I think about it, it's probably a much better solution. Thanks, I'll use this from now on :-)


Regarding merging this PR, is there anything else that I could do?

@johanneskoester johanneskoester merged commit 8236e80 into snakemake:main Apr 19, 2022
2 of 3 checks passed
@johanneskoester
Copy link
Contributor

johanneskoester commented Apr 19, 2022

Nope, thanks!

@sonarcloud
Copy link

sonarcloud bot commented Apr 19, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.8% 0.8% Duplication

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants