Martin Chapman edited this page Nov 25, 2021 · 1 revision

Phenoflow structures each phenotype definition according to a standard model. This model is based on the workflow paradigm[^1], which is designed to support reproducibility more widely.

[Figure: the Phenoflow model]

Salient features of the Phenoflow model are:

  • The separation of a phenotype into a set of steps.

  • A full population of patient data is passed between the steps sequentially.

  • The information in each step can be separated into three conceptual layers:

    • Abstract:
      • name - the name of the step, usually a short ID summarising its functionality
      • description - a longer description of the functionality of the step
      • type - a classification of the step under a given ontology (examples below)
    • Functional:
      • Inputs:
        • name - the name of the step input, usually a short ID summarising it
        • description - a longer description of the input
      • Outputs:
        • name - the name of the step output, usually a short ID summarising it
        • description - a longer description of the output
        • extension - the type of the data output from a step
    • Computational:
      • Implementations:
        • language - the language of this implementation unit
        • path - a path to this implementation unit
  • Connecting the information in the model with an actual set of implementation units, one for each step, creates a complete computable phenotype.

  • The first step in a phenotype is a connector (type currently load or external), designed to extract data from a data source without performing any processing on that data, and to pass it to the second step.

  • The final step in a phenotype outputs results to disk (type output).

  • Other steps in a definition are designed to describe the logic of the phenotype (types currently logic and boolean).

  • Logic steps are further divided into case steps and exclusion steps.

    • case steps output a truth value for each patient in the population, determining, based upon the piece of logic that step contains, whether that patient should be a member of the resulting cohort. The output of a case step does not affect any subsequent steps. As such, case steps are in a disjunctive relationship: only one needs to be true for a patient to be considered as having the condition being modelled.
    • exclusion steps determine whether an individual should be excluded based on not meeting certain criteria. Any case steps following a positive exclusion step cannot mark that individual as a case.
  • Branched logic is represented by 'flattening' each branch into a nested workflow (phenotype). The outputs from these workflows can thus be single truth values, referenced from individual steps.

    • Within a branch, for each member of the population, all case steps must be true and no exclusion step may be satisfied in order for the branch value to be satisfied.
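
The case, exclusion and branch rules above can be sketched in a few lines of Python. This is an illustration of one reading of those rules, not Phenoflow's own code: the `LogicStep` class, the `kind` field and the three functions are hypothetical names introduced here, and a patient is assumed to be a plain dictionary. In particular, the sketch assumes that a case step which fired before a positive exclusion step still counts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical patient record: any mapping of field name to value.
Patient = Dict[str, object]


@dataclass
class LogicStep:
    """A single logic step (hypothetical structure, for illustration only)."""
    name: str
    kind: str  # "case" or "exclusion"
    test: Callable[[Patient], bool]


def evaluate_top_level(steps: List[LogicStep], patient: Patient) -> bool:
    """Top-level steps: case steps are in disjunction, and a positive
    exclusion step stops any later case step from marking the patient."""
    excluded = False
    is_case = False
    for step in steps:
        result = step.test(patient)
        if step.kind == "exclusion":
            excluded = excluded or result
        elif step.kind == "case" and result and not excluded:
            is_case = True
    return is_case


def evaluate_branch(steps: List[LogicStep], patient: Patient) -> bool:
    """Inside a branch: every case step must be true and no exclusion step
    may be satisfied. Note that a branch with no case steps passes the
    'all cases' check trivially."""
    cases = [s.test(patient) for s in steps if s.kind == "case"]
    exclusions = [s.test(patient) for s in steps if s.kind == "exclusion"]
    return all(cases) and not any(exclusions)


def branch_as_step(name: str, kind: str, steps: List[LogicStep]) -> LogicStep:
    """'Flatten' a branch into one step that yields a single truth value."""
    return LogicStep(name, kind, lambda patient: evaluate_branch(steps, patient))
```

Wrapping a branch with `branch_as_step` mirrors the nested-workflow idea: the parent phenotype sees only one truth value per patient, regardless of how many steps the branch contains.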

These features prioritise four properties. Computable realisation and generality follow from keeping the description of the definition's logic (abstract and functional) separate from the details of its implementation (computational), and from the implementation itself, while retaining connectivity between the two. Composability follows from ensuring that steps are distinct from each other and that, when steps need to be connected (i.e. they are in a branch), those steps are represented as a nested workflow with a single overall truth value that can be output from a step. Clarity follows from ensuring that multiple descriptions of the same logic are provided.
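
To make the layering concrete, the following is a minimal sketch of how a step and its three layers might be held in memory, together with a check of the ordering rules listed above (connector first, output last, logic in between, one implementation unit per step). The class names, the `structural_problems` function and the example names and file paths are all hypothetical; only the field names and the step types (load, external, logic, boolean, output) come from the model description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StepInput:
    name: str
    description: str


@dataclass
class StepOutput:
    name: str
    description: str
    extension: str  # type of the data output from the step, e.g. "csv"


@dataclass
class Implementation:
    language: str
    path: str


@dataclass
class Step:
    # Abstract layer
    name: str
    description: str
    type: str
    # Functional layer
    inputs: List[StepInput] = field(default_factory=list)
    outputs: List[StepOutput] = field(default_factory=list)
    # Computational layer
    implementations: List[Implementation] = field(default_factory=list)


CONNECTOR_TYPES = {"load", "external"}
LOGIC_TYPES = {"logic", "boolean"}


def structural_problems(steps: List[Step]) -> List[str]:
    """Return human-readable problems; an empty list means the ordering
    rules are met and every step has at least one implementation unit."""
    if not steps:
        return ["phenotype has no steps"]
    problems = []
    if steps[0].type not in CONNECTOR_TYPES:
        problems.append(f"first step '{steps[0].name}' is not a connector")
    if steps[-1].type != "output":
        problems.append(f"final step '{steps[-1].name}' is not an output step")
    for step in steps[1:-1]:
        if step.type not in LOGIC_TYPES:
            problems.append(f"step '{step.name}' is not a logic step")
    for step in steps:
        if not step.implementations:
            problems.append(f"step '{step.name}' has no implementation unit")
    return problems


# Hypothetical three-step phenotype; names and paths are made up.
example = [
    Step("read-potential-cases", "Read the population from a file", "load",
         outputs=[StepOutput("potential-cases", "Initial cohort", "csv")],
         implementations=[Implementation("python", "read-potential-cases.py")]),
    Step("has-diagnosis", "Flag patients with a diagnosis code", "logic",
         inputs=[StepInput("potential-cases", "Initial cohort")],
         outputs=[StepOutput("flagged-cases", "Cohort with case flag", "csv")],
         implementations=[Implementation("python", "has-diagnosis.py")]),
    Step("output-cases", "Write the cohort to a file", "output",
         inputs=[StepInput("flagged-cases", "Cohort with case flag")],
         outputs=[StepOutput("cases", "Final cohort", "csv")],
         implementations=[Implementation("python", "output-cases.py")]),
]

print(structural_problems(example))
```

Because the abstract and functional fields never refer to a language or a path, the same step description could be paired with a different `Implementation` without changing the description of the logic.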

[^1]: The Phenoflow model might be described as an informal subset of CWL.
