What if a pipeline had a sample representation in YAML format that listed all of its attributes, both input and output?
The pipeline itself would find this useful for referring to sample paths and other things; the interface would find it useful as well (it could replace and extend the `outputs` section).
For example, a `sample_structure.yaml` file would be produced for each pipeline. It could be flat, or, if the pipeline author wants more structure, nested:
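A minimal sketch of what such a file might contain (the attribute names `cutadapt_report` and `bt2_aligned` come from the examples in this issue; the template fields and paths are hypothetical):

```yaml
# hypothetical sample_structure.yaml -- flat form
cutadapt_report: "{output_dir}/cutadapt/{sample_name}_cutadapt.txt"

# or, with more structure, nested sections:
alignments:
  bt2_aligned: "{output_dir}/aligned_{genome}/{sample_name}_sort.bam"
```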
So, it's totally flexible. Then, the pipeline would use this file (it accompanies the pipeline) and would provide `sample.cutadapt_report` or `sample.alignments.bt2_aligned`, which would give the populated strings for the current sample (produced with `str.format(*sample)`). This is superior to the current mode of defining sample subclasses with attributes because it is not tied to Python, and...
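As a rough sketch of how a pipeline could consume such a file, under stated assumptions: the function names `load_structure` and `populate` are hypothetical, not an existing peppy API, and since `{field}`-style templates are filled by keyword, this uses `str.format(**attrs)` rather than positional unpacking.

```python
def load_structure(structure_path):
    """Load the pipeline's sample_structure.yaml into a dict."""
    import yaml  # PyYAML, assumed installed
    with open(structure_path) as f:
        return yaml.safe_load(f)

def populate(structure, sample_attrs):
    """Fill each path template with the current sample's attributes,
    recursing into nested sections like 'alignments'."""
    if isinstance(structure, dict):
        return {key: populate(value, sample_attrs)
                for key, value in structure.items()}
    # each leaf is a template string such as "{output_dir}/{sample_name}.bam"
    return structure.format(**sample_attrs)
```

With dot-style attribute access (as attmap-like objects provide), `populate(...)["alignments"]["bt2_aligned"]` would surface as `sample.alignments.bt2_aligned`.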
It's also useful for the pipeline interface, which would add a new path:
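For instance, the interface could point at the file with an entry along these lines (the key name `sample_structure` is hypothetical):

```yaml
# hypothetical pipeline_interface.yaml fragment
pipeline_name: example_pipeline
sample_structure: sample_structure.yaml
```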
Downstream tools that understand the pipeline interface (in any language) would then also know about the sample structure. We would use this in place of the current `outputs` section. Would it be useful for anything else? An R package that wants to read in a peak file, for example. Right now, we're having to hard-code these kinds of outputs; for example: https://github.com/databio/pepatac/blob/a0e4347b199c91bbfd7d994b0705da2ca8d51015/BiocProject/readPepatacPeakBeds.R#L11
The `outputs` section solves this, I suppose. This is just a more universal solution that would also solve it at the Python level. A disadvantage of sticking it directly in the piface (like the current `outputs` approach) is that the pipeline itself can't use it... unless it became aware of the pipeline interface. But given that the piface is conceptually external to the pipeline, dividing these seems to add flexibility and make more sense.
This concept could be built into the peppy `Sample` object.
Related to #201, #61