# Task Examples

Provided below are two examples of mortality prediction tasks for which ACES can readily extract subject cohorts. The configurations have been tested on the provided synthetic data in the repository (`../../../sample_data/`), as well as on the MIMIC-IV dataset loaded using MEDS & ESGPT (with very minor changes to the predicate definitions below). The configuration files for both of these tasks are provided in the repository (`../../../sample_configs`), and cohorts can be extracted using the `aces-cli` tool:

```bash
aces-cli data.path='/path/to/MIMIC/ESGPT/schema/' data.standard='esgpt' cohort_dir='../../../sample_configs' cohort_name='...'
```

For simplicity and consistency of these examples, we will use the following 4 window types and names. In practice, ACES supports arbitrary window types and window names, so you have full flexibility and control in how you define your windows in your task logic.

![Window Legend](../assets/windows.svg)

In [None]:
import json

import yaml
from bigtree import print_tree

from aces import config

In [None]:
config_path = "../../../sample_configs"

## In-hospital Mortality

The below timeline specifies a binary in-hospital mortality prediction task where we aim to predict whether the patient dies (label=`1`) or is discharged (label=`0`):

![In-hospital Mortality](../assets/inhospital_mortality.svg)

Suppose we'd like to use all patient data up to and including 24 hours past an admission. We can therefore define the `input` window as above. We can also place criteria on the windows to filter our cohort. In this case, we'd like to ensure there is sufficient prior input data for our model, so we place a constraint that there must be at least 5 records (i.e., with unique timestamps) within `input`.

Next, suppose we'd like to only include hospital admissions that were longer than 48 hours. To represent this clause, we can specify `gap` as above with a length of 48 hours (overlapping the initial 24 hours of `input`). If we then place constraints on `gap` that prevent it from containing any discharge or death events, the admission must be at least 48 hours long.

Finally, we specify `target`, which is our prediction horizon and lasts until the next discharge or death event. This allows us to extract a cohort that includes both patients who died and those who did not (i.e., were successfully discharged).

We can then specify a task configuration as below:

```yaml
predicates:
  admission:
    code: event_type//ADMISSION
  discharge:
    code: event_type//DISCHARGE
  death:
    code: event_type//DEATH
  discharge_or_death:
    expr: or(discharge, death)

trigger: admission

windows:
  input:
    start: NULL
    end: trigger + 24h
    start_inclusive: True
    end_inclusive: True
    has:
      _ANY_EVENT: (5, None)
    index_timestamp: end
  gap:
    start: trigger
    end: start + 48h
    start_inclusive: False
    end_inclusive: True
    has:
      admission: (None, 0)
      discharge: (None, 0)
      death: (None, 0)
  target:
    start: gap.end
    end: start -> discharge_or_death
    start_inclusive: False
    end_inclusive: True
    label: death
```

### Predicates

To capture our task definition, we must define at least three predicates. Recall that these predicates are dataset-specific, and thus may differ depending on the data standard or data schema used.

For starters, we are specifically interested in mortality "in the hospital". As such, an `admission` and a `discharge` predicate would be needed to represent events where patients are officially admitted "into" the hospital and where patients are officially discharged "out of" the hospital. We also need the `death` predicate to capture death events so we can accurately capture the mortality component. 

Since our task endpoint could be either `discharge` or `death` (i.e., binary label prediction), we may also create a derived predicate `discharge_or_death`, which is expressed as an `OR` relationship between `discharge` and `death`.
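To make the relationship between plain and derived predicates concrete, here is a minimal sketch in plain Python. It is not how ACES evaluates predicates internally; the toy event list, the `event_type//LAB` code, and the `evaluate_predicates` helper are assumptions made purely for illustration.

```python
# Toy event stream for one subject (hypothetical data). Times are hours since admission.
events = [
    {"time": 0, "code": "event_type//ADMISSION"},
    {"time": 30, "code": "event_type//LAB"},  # hypothetical non-endpoint event
    {"time": 72, "code": "event_type//DISCHARGE"},
]

# Plain predicates: each name maps to the code it matches, as in the YAML above.
PLAIN_PREDICATES = {
    "admission": "event_type//ADMISSION",
    "discharge": "event_type//DISCHARGE",
    "death": "event_type//DEATH",
}


def evaluate_predicates(event):
    """Return predicate name -> bool for a single event (illustrative helper)."""
    flags = {name: event["code"] == code for name, code in PLAIN_PREDICATES.items()}
    # Derived predicate, mirroring `expr: or(discharge, death)`.
    flags["discharge_or_death"] = flags["discharge"] or flags["death"]
    return flags


rows = [evaluate_predicates(event) for event in events]
for event, flags in zip(events, rows):
    print(event["time"], flags)
```

The derived predicate is true only for the final discharge event, which is what lets a single predicate mark either endpoint of the admission.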

### Trigger

A prediction can be made for each event specified in `trigger`. This field must contain one of the previously defined dataset-specific predicates. In our case, we'd like to make a prediction of mortality for each valid admission in our cohort, and thus we set `trigger` to be the `admission` predicate. 

### Windows

The windows section contains the remaining three windows we defined previously - `input`, `gap`, and `target`.

`input` begins at the start of a patient's record (i.e., `NULL`), and ends 24 hours past `trigger` (i.e., `admission`). As we'd like to include the events at both the start and end of `input`, if present, we set both `start_inclusive` and `end_inclusive` to `True`. Our constraint on the number of records is specified in `has` using the `_ANY_EVENT` predicate, with its value set to be greater than or equal to 5 (i.e., unbounded on the right, as seen in `(5, None)`). **Note**: Since we'd like to make a prediction at the end of `input`, we can set `index_timestamp` to be `end`, which corresponds to the timestamp of `trigger + 24h`.
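The `(min, max)` bounds and the `input` window check can be sketched in plain Python as follows. This is only an illustration of the logic described above, not the ACES implementation; `within_bounds` and `input_window_ok` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def within_bounds(count, bounds):
    """Check a count against (min, max), where None means unbounded on that side."""
    lower, upper = bounds
    return (lower is None or count >= lower) and (upper is None or count <= upper)


def input_window_ok(event_times, trigger_time, bounds=(5, None)):
    """Apply the `_ANY_EVENT: (5, None)` constraint to the `input` window.

    The window starts at the beginning of the record (start: NULL) and ends at
    trigger + 24h, with the end boundary included (end_inclusive: True).
    Returns (constraint satisfied, window end used as the index timestamp).
    """
    end = trigger_time + timedelta(hours=24)
    unique_times = {t for t in event_times if t <= end}  # records = unique timestamps
    return within_bounds(len(unique_times), bounds), end


trigger = datetime(2020, 1, 1, 8)
times = [trigger - timedelta(days=d) for d in (3, 2, 1)]
times += [trigger, trigger, trigger + timedelta(hours=24), trigger + timedelta(hours=30)]
ok, index_timestamp = input_window_ok(times, trigger)
print(ok, index_timestamp)
```

In this toy record, the two events sharing the trigger timestamp count once, and the event at `trigger + 30h` falls outside the window, leaving exactly 5 records.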

`gap` begins at `trigger`, and ends 48 hours after. As the left boundary event is the `trigger` itself (i.e., `admission`), which has already been accounted for, it should not play a role in `gap`. As such, we set `start_inclusive` to `False`. As we'd like our admission to be at least 48 hours long, we can place constraints specifying that there cannot be any `admission`, `discharge`, or `death` in `gap` (i.e., right-bounded at `0`, as seen in `(None, 0)`).
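As a rough sketch of the `gap` logic, assuming events are represented as `(timestamp, predicate_name)` tuples (a representation chosen only for this example), the exclusive start and the zero-count constraints could look like this. The `gap_ok` helper is hypothetical and not part of ACES.

```python
from datetime import datetime, timedelta

BLOCKED = {"admission", "discharge", "death"}  # each constrained to (None, 0) in `gap`


def gap_ok(events, trigger_time, hours=48):
    """Return True if no blocked event falls in (trigger, trigger + hours].

    The start is exclusive (start_inclusive: False), so the triggering admission
    itself does not violate the `admission: (None, 0)` constraint.
    """
    end = trigger_time + timedelta(hours=hours)
    return not any(
        name in BLOCKED for time, name in events if trigger_time < time <= end
    )


trigger = datetime(2020, 1, 1, 8)
long_stay = [
    (trigger, "admission"),
    (trigger + timedelta(hours=10), "lab"),
    (trigger + timedelta(hours=72), "discharge"),
]
short_stay = [
    (trigger, "admission"),
    (trigger + timedelta(hours=48), "discharge"),  # on the inclusive end boundary
]
print(gap_ok(long_stay, trigger), gap_ok(short_stay, trigger))
```

Had the start been inclusive, the triggering admission would count toward `admission` and every candidate would be rejected, which is one practical reason for excluding the left boundary here.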

`target` begins at the end of `gap`, and ends at the next discharge or death event (i.e., the `discharge_or_death` predicate). To express this, we use the arrow notation (i.e., `->` and `<-`), which ACES recognizes as event references (see [Time Range Fields](https://eventstreamaces--39.org.readthedocs.build/en/39/configuration.html#time-range-fields)). Similarly, as the event at the end of `gap`, if any, is already included in `gap`, we can set `start_inclusive` to `False`. **Note**: Since we'd like to make a binary mortality prediction, we can extract the `death` predicate as a label from `target`, by specifying the `label` field to be `death`.
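The event-bounded `target` window and its label can be illustrated with a small plain-Python sketch. It again assumes time-sorted `(timestamp, predicate_name)` tuples and a hypothetical `extract_target` helper; it is a sketch of the idea, not the ACES extraction algorithm.

```python
from datetime import datetime, timedelta

ENDPOINTS = {"discharge", "death"}  # the `discharge_or_death` derived predicate


def extract_target(events, gap_end):
    """Find the `target` window end and label for time-sorted events.

    The window is (gap_end, next discharge_or_death], mirroring
    `end: start -> discharge_or_death` with start_inclusive: False and
    end_inclusive: True. Returns None when no endpoint follows gap_end.
    """
    target_end = next(
        (time for time, name in events if time > gap_end and name in ENDPOINTS),
        None,
    )
    if target_end is None:
        return None
    # label: death -> 1 if a death event occurs inside the window, else 0
    label = int(
        any(name == "death" for time, name in events if gap_end < time <= target_end)
    )
    return target_end, label


trigger = datetime(2020, 1, 1, 8)
gap_end = trigger + timedelta(hours=48)
died = [(trigger, "admission"), (trigger + timedelta(hours=90), "death")]
survived = [(trigger, "admission"), (trigger + timedelta(hours=90), "discharge")]
print(extract_target(died, gap_end), extract_target(survived, gap_end))
```

Because the window ends at whichever endpoint comes first, a stay ending in discharge yields label `0` and one ending in death yields label `1`.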

### Task Tree

ACES is then able to parse our configuration file and generate the below task tree that captures our task. You can see that the relationships between nodes in the tree reflect those of the task timeline:

In [None]:
inhospital_mortality_cfg_path = f"{config_path}/inhospital_mortality.yaml"
cfg = config.TaskExtractorConfig.load(config_path=inhospital_mortality_cfg_path)
tree = cfg.window_tree
print_tree(tree)

## Imminent Mortality

The below timeline specifies a binary imminent mortality prediction task where we aim to predict whether the patient dies (label=`1`) or not (label=`0`) in the 24 hours immediately following a 2-hour period from any given time:


![Imminent Mortality](../assets/imminent_mortality.svg)

In this case, we'd like to use all patient data up to and including the triggers (i.e., every event). However, as we won't be placing any constraints on this window, we do not need to add it to our task configuration, as ultimately, any and all data rows prior to the `trigger` timestamp will be included.

You can see that the `trigger` window essentially encapsulates the entire patient record. This is because we'd like to define a task to predict mortality at every single event in the record for simplicity. In practice, this might not be reasonable or feasible. For instance, you may only be interested in predicting imminent mortality within an admission. In this case, you might create `admission_window`, starting from `admission` predicates to `discharge_or_death` predicates. ACES would create a branch in the task tree from this window, and since the results ensure that all output rows satisfy all tree branches, the cohort would only include triggers on events in `admission_window`.

For this particular example, we create `gap` of 2 hours and `target` of 24 hours following `gap`. No specific constraints are set for either window, except for the time durations.

We can then specify a task configuration as below:

```yaml
predicates:
  death:
    code: event_type//DEATH

trigger: _ANY_EVENT

windows:
  gap:
    start: trigger
    end: start + 2 hours
    start_inclusive: True
    end_inclusive: True
    index_timestamp: end
  target:
    start: gap.end
    end: start + 24 hours
    start_inclusive: False
    end_inclusive: True
    label: death
```

### Predicates

Only a `death` predicate is required in this example to capture our `label` in `target`. However, as noted [here](https://eventstreamaces.readthedocs.io/en/latest/overview.html#special-predicates), certain special predicates can be used without explicit definition. In this case, we will make use of `_ANY_EVENT`.

### Trigger

A prediction can be made for each and every event. As such, `trigger` is set to the special predicate `_ANY_EVENT`.

### Windows

The windows section contains the two windows we defined - `gap` and `target`. In this case, the `gap` and `target` windows are defined relative to every single event (i.e., `_ANY_EVENT`). `gap` begins at `trigger`, and ends 2 hours after. `target` begins at the end of `gap`, and ends 24 hours after.

**Note**: Since we'd again like to make a binary mortality prediction, we can extract the `death` predicate as a label from `target`, by specifying the `label` field to be `death`. Additionally, since a prediction would be made at the end of each `gap`, we can set `index_timestamp` to be `end`, which corresponds to the timestamp of `_ANY_EVENT + 2h`.
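To see how the fixed-length windows translate into one row per trigger, here is a minimal plain-Python sketch. It assumes `(timestamp, predicate_name)` tuples and a hypothetical `imminent_mortality_rows` helper, and it ignores any edge handling ACES may apply (for example, at the end of a patient record).

```python
from datetime import datetime, timedelta


def imminent_mortality_rows(events):
    """Produce (index_timestamp, label) for every event used as a trigger.

    gap:    [trigger, trigger + 2h]   with index_timestamp at its end
    target: (gap.end, gap.end + 24h]  labelled by the presence of `death`
    """
    rows = []
    for trigger_time, _ in events:
        gap_end = trigger_time + timedelta(hours=2)
        target_end = gap_end + timedelta(hours=24)
        label = int(
            any(
                name == "death" and gap_end < time <= target_end
                for time, name in events
            )
        )
        rows.append((gap_end, label))
    return rows


start = datetime(2020, 1, 1, 8)
record = [
    (start, "lab"),
    (start + timedelta(hours=1), "lab"),
    (start + timedelta(hours=20), "death"),
]
for index_timestamp, label in imminent_mortality_rows(record):
    print(index_timestamp, label)
```

In this toy record, the first two triggers are labelled `1` because the death falls inside their `target` windows, while the death event itself, used as a trigger, is labelled `0` since it precedes its own `target` window.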

### Task Tree

As in the in-hospital mortality case, ACES is able to parse our configuration file and generate a task tree:

In [None]:
imminent_mortality_cfg_path = f"{config_path}/imminent_mortality.yaml"
cfg = config.TaskExtractorConfig.load(config_path=imminent_mortality_cfg_path)
tree = cfg.window_tree
print_tree(tree)

## Other Examples

A few other examples are provided in `../../../sample_configs/` of the repository. We will continue to add task configurations to this folder or to a benchmarking effort for EHR representation learning. More information can be found [here](https://github.com/mmcdermott/PIE_MD/tree/main) - stay tuned!