Should the pipeline interface include path to sample yaml outputs? #61

Closed
nsheff opened this issue Oct 24, 2018 · 10 comments
@nsheff
Contributor

nsheff commented Oct 24, 2018

In #32 we build a pipeline interface section called summary_results, which records the location of summarizer results.

What about something similar to report the location of sample yaml file from the pipeline?

@stolarczyk
Member

I'm not sure I understand. Could you give more context here?

is the goal here to set the future location of sample.yaml file in a pipeline interface and then use this path to save the file there instead of in a default spot?

@nsheff
Contributor Author

nsheff commented Apr 3, 2020

well, this is a few years old... but yes I believe your interpretation is correct. I believe this is sort of accomplished by the output schema concept...

@stolarczyk
Member

more by the input schema, I think. {sample_name}.yaml file consists of key-value pairs of all public sample attributes, so input schema is related. Yet it just defines the type of the attrs, not their values.

@stolarczyk
Member

so what's the key for the sample yaml path in the pipeline interface, if we even want to proceed?

sample_yaml_path, sample_attrs_path, sample_path, sample_file_path, sample_file, sample_yaml?

@nsheff
Contributor Author

nsheff commented Apr 3, 2020

Well, there can be an input sample.yaml, which is in some sense an instance of the object specified by the input schema, and an output sample.yaml which is in some sense an instance of the object specified by the output schema.

we could produce both yamls. right now we only produce the first.
there could be:

pipelines:
  pipeline:
    input_sample_yaml_path: "{sample.sample_name}.yaml"
    output_sample_yaml_path: "{sample.sample_name}_output.yaml"

those are relative to the path specified in looper.output_dir, with the above values as defaults, but can be overridden in the pipeline interface?

just brainstorming here...

@stolarczyk
Member

stolarczyk commented Apr 3, 2020

I think it all makes sense.

So, in practice:
output sample yaml = input sample yaml + populated sample attrs defined in the pipeline output schema ?

@nsheff
Contributor Author

nsheff commented Apr 3, 2020

output sample yaml = input sample yaml + populated sample attrs defined in the pipeline output schema ?

makes sense to me... I think it's a superset of the input yaml. the only reason we make the input yaml is because it's used as an input to the pipeline.

In fact... why even make the input yaml? if the output yaml is a superset of it, then it could be used as an input to the pipeline as well...

so now, with this model, there is only 1 sample.yaml, which is exactly what you say: input yaml + populated sample attrs defined in the output schema.

one question: would this yaml include a property given in the sample table that is not specified in the input schema?

@stolarczyk
Member

that's right, input sample.yaml is probably not necessary in such a case.

one question: would this yaml include a property given in the sample table that is not specified in the input schema?

I'd say yes, I can imagine writing a small schema (for example with just required attrs), that does not necessarily cover all the sample attributes. Still I'd expect all my columns from sample table to be accessible in the yaml file.

@nsheff
Contributor Author

nsheff commented Apr 3, 2020

I'd say yes,

I agree.
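
A minimal sketch of the single-sample.yaml model discussed above, for illustration only. It assumes the sample is available as a plain dict of sample-table columns and that the output schema is a JSON-Schema-like dict with a `properties` key; `build_sample_yaml` and all the example values are hypothetical and not part of looper.

```python
def build_sample_yaml(sample_attrs, output_schema, results):
    """Combine sample attributes with populated output-schema attributes.

    sample_attrs:  all columns from the sample table (hypothetical dict form)
    output_schema: assumed to look like {"properties": {name: {...}}}
    results:       values reported by the pipeline, keyed by attribute name
    """
    # Start from every sample attribute, whether or not an input schema lists it.
    merged = dict(sample_attrs)
    # Add only the results that the output schema declares.
    for key in output_schema.get("properties", {}):
        if key in results:
            merged[key] = results[key]
    return merged


# Hypothetical example data.
sample = {"sample_name": "frog_1", "protocol": "RNA-seq", "genome": "xenLae2"}
output_schema = {"properties": {"aligned_bam": {"type": "string"}}}
results = {"aligned_bam": "frog_1.bam", "scratch": "tmp"}

record = build_sample_yaml(sample, output_schema, results)
print(record)
```

In this sketch `genome` is kept even if a small input schema never mentions it, while `scratch` is dropped because the output schema does not declare it.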

@stolarczyk
Member

for future reference:

sample yaml file path is constructed from a template in pipeline interface file

pipelines:
  pipeline:
    # path is relative to looper.output_dir
    sample_yaml_path: >
     {sample.sample_name}.yaml

if the sample_yaml_path section is missing, the file is saved in the submission directory as <sample_name>.yaml
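
The resolution rule above could be sketched roughly as follows. This is not looper's actual code: `resolve_sample_yaml_path`, its parameters, and the directory names are assumptions made for illustration, and the pipeline interface section is treated as a plain dict.

```python
import os
from types import SimpleNamespace


def resolve_sample_yaml_path(pipeline_iface, sample, output_dir, submission_dir):
    """Return the path where the sample yaml would be written (hypothetical helper)."""
    template = pipeline_iface.get("sample_yaml_path")
    if template is None:
        # No template in the pipeline interface: fall back to the submission directory.
        return os.path.join(submission_dir, f"{sample.sample_name}.yaml")
    # A folded YAML scalar (">") ends with a newline, so strip before formatting.
    relative = template.strip().format(sample=sample)
    # Template paths are interpreted relative to the looper output directory.
    return os.path.join(output_dir, relative)


# Hypothetical sample object and directories.
frog = SimpleNamespace(sample_name="frog_1")
iface = {"sample_yaml_path": "{sample.sample_name}.yaml\n"}

print(resolve_sample_yaml_path(iface, frog, "results", "results/submission"))
print(resolve_sample_yaml_path({}, frog, "results", "results/submission"))
```

The first call uses the template; the second shows the fallback when the key is absent.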
